ABSTRACT

Recent research results strongly suggest that the 
theoretical problems of change measures have limited practical 
significance for measuring individual growth, and it is important to 
determine whether this is also the case for measuring school impact. 
Accordingly, in this study artificial data were used to assess the
correlation between several estimates of average student change in
various schools and the "true" impact of the same schools. Because it
seems desirable for artificial data to resemble real data, the 
computer procedure was designed to reproduce selected aspects of the 
Educational Testing Service Growth Study and of the Project TALENT 
study of high schools in the U.S. Results indicate that all estimates 
involving pretest-posttest differences measure school impact with 
reasonable accuracy. It is important to measure change over the 
entire course of learning, however, and not just over the later 
stages of learning. The correlations between change scores and other 
school characteristics reflect with reasonable accuracy the 
relationships between those characteristics and impact, but will be 
large only when the underlying relationships are substantial. Simple 
gain scores measure the true situation about as accurately as other 
change estimates, are easier to compute, and probably are more 
meaningful to nonresearchers. (Author/JH) 
Introductory Statement

The Center for Social Organization of Schools has two primary
objectives: to develop a scientific knowledge of how schools affect
their students, and to use this knowledge to develop better school
practices and organization.

The Center works through three programs to achieve its objectives.
The Schools and Maturity program is studying the effects of school,
family, and peer group experiences on the development of attitudes
consistent with psychosocial maturity. The objectives are to formulate,
assess, and research important educational goals other than
traditional academic achievement. The School Organization program is 
currently concerned with authority-control structures, task structures, 
reward systems, and peer group processes in schools. The Careers 
program (formerly Careers and Curricula) bases its work upon a theory 
of career development. It has developed a self-administered vocational 
guidance device and a self-directed career program to promote vocational 
development and to foster satisfying curricular decisions for high 
school, college, and adult populations. 

This report, prepared by the School Organization program, examines 
methods of assessing the effectiveness of schools and educational 
programs in promoting educational growth of students. 
Abstract

Artificial data were used to assess the correlation between
several estimates of average student change in various schools and
the "true" impact of those schools. Results indicate that all
estimates involving pretest-posttest differences measure school
impact with reasonable accuracy. It is important to measure change
over the entire course of learning, however, and not just over the
later stages of learning. The correlations between change scores and
other school characteristics reflect with reasonable accuracy the
relationships between those characteristics and impact, but will be
large only when the underlying relationships are substantial. 
Simple gain scores measure the true situation about as accurately as 
other change estimates, are easier to compute, and probably are more 
meaningful to non-researchers. 



Introduction

A basic purpose of education is to promote desirable change or
growth in the educational attainment of students. It follows that
schools or other educational programs should be evaluated largely on
their effectiveness in promoting such change. There are many theoretical
problems in estimating student change from scores on standard tests of
educational attainment, however, and these problems are heightened in
the typical situation where the students entering various schools differ 
systematically (Astin and Panos, 1971; Cronbach and Furby, 1970; Harris, 
1963; Herriott and Muse, 1973; Klitgaard and Hall, 1973; O'Connor, 1972).

It has been difficult to assess the practical importance of these 
theoretical problems because true change scores are unknown in most 
longitudinal research. Recently, a computer procedure was developed to 
provide artificial data in which these true change scores are known 
(Richards, Karweit, and Prevatt, in press). When such artificial data 
were used to compare several statistical techniques for assessing change
in individual students (Richards, 1974), the results indicated that
individual change is measured with reasonable accuracy by all techniques
that involve the difference between the pretest and the posttest. In 
particular, the simple difference between the pretest and the posttest 
is about as accurate as other change estimates, such as regressed gain 
scores, and is much easier to compute than other estimates. These trends 
hold even when students are assigned nonrandomly to schools that differ 
in their impact on students.

These results strongly suggest that the theoretical problems of
change measures have limited practical significance for measuring
individual growth, and it is important to determine whether this is also
the case for measuring school impact. Accordingly, in this study
artificial data were used to assess the correlation between several estimates
of average student change in various schools and the "true" impact of
the same schools. This study is stated in the context of education, but
the procedures for generating data and measuring change are abstract.
Therefore, the results should generalize to many situations where one 
wishes to compare the impact of varying social interventions. 

Method 

Simulation Procedure. Because it seems desirable for artificial
data to resemble real data as closely as possible, the computer procedure
was designed (Richards et al., in press) to reproduce selected aspects
of the ETS Growth Study (Hilton, Beaton, and Bower, 1971) and of the
Project TALENT study of high schools in the United States (Flanagan
et al., 1962). In the ETS Growth Study, students were assessed initially
with a measure of academic potential (SCAT) and a measure of educational 
attainment (STEP). Subject to the usual attrition in longitudinal 
research, the educational attainment of these students was reassessed 
on three subsequent occasions. Project TALENT provided intercorrelations 
among a variety of community, school, and student characteristics for 
a representative sample of U. S. high schools. 



The computer procedure generates scores for individual students
that strive to reproduce the means, standard deviations, and
intercorrelations obtained in the ETS Growth Study. The student's score on
academic potential is generated first and used to derive that student's
score on initial academic attainment. Then gain scores are generated
and added to yield subsequent attainment scores. True standard scores
are generated initially, then the appropriate amount of random error is
added to each score and the scores are transformed to the metric of the
ETS Growth Study observed scores. This simulation procedure closely
reproduces the ETS Growth Study results (Richards, 1974).

The simulation procedure permits the investigator to assign students 
to schools either randomly or nonrandomly. When students are assigned 
nonrandomly, the program strives to reproduce the average correlation 
between community per capita income and average academic potential of 
students estimated from Project TALENT results (P= .1^4). The ratio 
of between schools variance to total variance also simulates the Project 
TALENT ratio. 

The simulation procedure assumes that community per capita income 
determines school resources, and that school resources in turn determine 
school impact. A review of Project TALENT results suggested an average 
correlation of approximately .25 between community income and those 
school resources commonly assumed to facilitate student growth, so the 
simulation procedure strives to reproduce this relationship between 
income and resources. Community income is drawn randomly from a normal
distribution, and it is assumed that school resources and school impact
also are normally distributed.
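
The assumed chain (income determines resources, resources determine impact) can be illustrated with a minimal sketch. This is not the authors' program: the function names, the seed, and the use of standardized variables are assumptions made for illustration. Only the .25 income-resources correlation comes from the text; the resources-impact correlation is whatever value the investigator specifies.

```python
import math
import random


def correlated_normal(x, r, rng):
    """Return a standard normal value that correlates r with the standard normal x."""
    return r * x + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)


def pearson(xs, ys):
    """Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_schools(n_schools, r_income_resources, r_resources_impact, seed=0):
    """Hypothetical helper: draw standardized income, resources, and impact."""
    rng = random.Random(seed)
    income = [rng.gauss(0.0, 1.0) for _ in range(n_schools)]
    resources = [correlated_normal(x, r_income_resources, rng) for x in income]
    impact = [correlated_normal(x, r_resources_impact, rng) for x in resources]
    return income, resources, impact
```

Because impact depends on income only through resources in this sketch, the expected income-impact correlation is the product of the two specified correlations.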

There is little empirical basis for estimating either the correlation
between resources and impact or the extent to which schools vary
in impact. Therefore, the simulation procedure allows the investigator
to specify both the correlation between resources and impact and the
standard deviation of the impact variable. This standard deviation is
specified in the form of a number between 0 and 1. When the standard
deviation is .10, the average growth values used in generating scores
are equal to the average growth scores obtained in the ETS study for a
school with average impact, and are 10% higher than the ETS averages for
a school one standard deviation above the mean on impact. (The simulated
data appear to meet the assumptions for this manipulation even if the
ETS data do not.)
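
One way to read this adjustment is as a proportional scaling of the mean growth parameter. The sketch below assumes that reading; the function name and the base growth value of 10.0 are hypothetical, and the actual ETS averages are not reproduced here.

```python
def adjusted_mean_growth(base_growth, impact_z, impact_sd=0.10):
    """Scale a mean growth parameter by a school's standardized impact.

    base_growth: average growth for a school with average impact (hypothetical value).
    impact_z: the school's impact in standard deviation units.
    impact_sd: the investigator-specified standard deviation of impact.
    """
    return base_growth * (1.0 + impact_sd * impact_z)


# A school one standard deviation above the mean gets 10% more mean growth.
average_school = adjusted_mean_growth(10.0, 0.0)
strong_school = adjusted_mean_growth(10.0, 1.0)
```

Under this reading, a school two standard deviations below the mean would receive a mean growth parameter 20% below the base value.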

Gain scores for individuals are generated according to the following
principle:

$G_t = G_m + G_d$

where $G_t$ is total (true) growth, $G_m$ is average (or mean) growth (i.e.,
the parameter estimated from the ETS data), and $G_d$ is a deviation from
this average that represents individual differences in true growth. The
total gain score is added to the pretest score to yield the posttest 
score, and the posttest score then becomes the pretest for the next 
growth interval. For each growth interval, the pretest is one of the 
elements entering a multiple regression formula used to generate the 
growth values. The correlations between pretest and growth become increasingly
negative for successive intervals (Richards, 1974). 
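
The chaining of growth intervals can be sketched as follows. The sketch takes the individual deviations as given rather than generating them, because the multiple regression formula itself is not stated in this report; the function name and all numeric values are hypothetical.

```python
def chain_scores(initial_score, mean_growths, deviations):
    """Accumulate true scores over successive growth intervals.

    mean_growths: mean growth for each interval.
    deviations: the individual's deviation from mean growth in each interval.
    """
    scores = [initial_score]
    for mean_growth, deviation in zip(mean_growths, deviations):
        total_growth = mean_growth + deviation  # total growth = mean + deviation
        # The posttest for this interval becomes the pretest for the next one.
        scores.append(scores[-1] + total_growth)
    return scores


# Hypothetical student: initial score 250, three growth intervals.
example = chain_scores(250.0, [10.0, 8.0, 6.0], [1.5, -0.5, 0.0])
```

The returned list holds the initial score followed by one true score per occasion, so three intervals yield four scores.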

In generating scores, the mean growth parameters for the three
intervals are adjusted for school impact, and no other changes are made. 
Consequently, the adjusted mean growth parameters frequently will not be 
equal to the obtained average true growth scores for a given school. 
A school with above average impact will have higher than average mean 
growth parameters and therefore higher than average true posttest scores.
These become higher than average true pretest scores for subsequent
learning intervals, and these higher pretest scores make an increasingly
negative contribution in the computation of subsequent true growth scores.
The averages of the obtained true growth scores for that school will tend
to be lower than the adjusted mean growth parameters. Similarly, the 
averages of the obtained true growth scores will tend to be higher than 
the adjusted mean growth parameters for a school with below average impact. 
Table 1 presents a simplified illustration of these trends for five 
hypothetical schools that are average in every respect except for differing 
in impact. Because other parameters besides pretest score are involved
in generating scores (Richards, 1974), it is conceivable that a school
with below average impact (and therefore below average adjusted mean
growth parameters) will have higher average obtained true growth scores
than a school with above average impact. This is especially true when
students are assigned to schools nonrandomly.

Insert Table 1 About Here




Data Sets. Six independent sets of simulated data were generated
for the present study. In each set students were assigned to 100 schools
or treatments. The number of students per school varied randomly with
mean = 130 and standard deviation = 13. Therefore, the total number of
students in each of these six sets was approximately 13,000.

In three of these sets students were assigned randomly to schools 
or treatments, and in the other three sets students were assigned 
nonrandomly. Under each type of assignment, simulated data were generated 
for three different assumptions about the relationship between school 
resources and school impact. Specifically, it was assumed that school 
resources account for 5%, 20%, or 80% of the variance in school impact
(corresponding to correlations of .2236, .4472, or .8944). 
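
These correlations are the square roots of the stated variance proportions. A short check, with a hypothetical function name:

```python
import math


def correlation_from_variance_share(share):
    """Correlation implied when one variable accounts for `share` of the variance."""
    return math.sqrt(share)


for share in (0.05, 0.20, 0.80):
    print(share, round(correlation_from_variance_share(share), 4))
```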

Finally, in all six sets the standard deviation of the impact variable 
was set at .10. At approximately this magnitude two simulated schools 
one standard deviation apart on impact (with N's = 130) will differ at
the .05 level when compared with respect to educational growth between 
successive occasions. 

Change Measures. A wide variety of change measures have been proposed
(Cronbach and Furby, 1970), but recent results suggest that most of these
measures yield essentially equivalent results (Richards, 1974).
Accordingly, this study used only four measures of change, each representing
a different approach to estimating change. These change estimates
included:

1. Posttest score. 

2. Posttest score adjusted for initial academic potential. This
change estimate is the difference between posttest score and
predicted posttest score, using initial academic potential as
the predictor. (The prediction equation for each data set was
based on the observed relationships in that set.) Thus, this
technique resembles analysis of covariance with academic
potential treated as the covariate.

3. Raw gain. This change score is the simple difference between 
pretest score and posttest score. 

4. Raw residual gain. This estimate is the difference between 
posttest score and predicted posttest score, using pretest 
score as the predictor. 
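
The four estimates can be sketched for a single data set as follows. This is an illustration under assumptions, not the original program: the function names are hypothetical, the prediction equations are taken to be simple least-squares lines with one predictor, and raw gain is computed as posttest minus pretest.

```python
def mean(values):
    return sum(values) / len(values)


def residuals(outcome, predictor):
    """Outcome minus its least-squares prediction from a single predictor."""
    mx, my = mean(predictor), mean(outcome)
    slope = (sum((x - mx) * (y - my) for x, y in zip(predictor, outcome))
             / sum((x - mx) ** 2 for x in predictor))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(predictor, outcome)]


def change_measures(potential, pretest, posttest):
    """Return the four change estimates, one value per student (or per school mean)."""
    return {
        # 1. Posttest score.
        "posttest": list(posttest),
        # 2. Posttest adjusted for initial academic potential.
        "adjusted_posttest": residuals(posttest, potential),
        # 3. Raw gain: posttest minus pretest.
        "raw_gain": [post - pre for pre, post in zip(pretest, posttest)],
        # 4. Raw residual gain: posttest adjusted for pretest.
        "residual_gain": residuals(posttest, pretest),
    }
```

The same function applies to individual scores or to school means, since each estimate depends only on the three input columns.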

Results

To facilitate comparison with the earlier study of individual change 
estimates (Richards, 1974), the first step in the data analysis was to
compute the correlations between average estimated change scores for 
various schools and average true change scores for the same schools. An 
unresolved question is whether it is better to compute change scores for 
individual students and then average within schools or to compute change 
scores from school means (Dyer, Linn, and Patton, 1969), so both procedures 
were used to estimate change in this analysis. Table 2 summarizes the 
results. 
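
The two procedures can be contrasted for residual gain with a minimal sketch. The function names and the data layout (one pair of pretest and posttest lists per school) are assumptions made for illustration, and the prediction equation is taken to be a simple least-squares line.

```python
def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def residual_gain_two_ways(schools):
    """schools: list of (pretests, posttests) pairs, one pair per school."""
    # Procedure A: fit on individual students, then average residuals within schools.
    all_pre = [p for pre, _ in schools for p in pre]
    all_post = [q for _, post in schools for q in post]
    slope, intercept = fit_line(all_pre, all_post)
    individual_then_average = [
        mean([q - (intercept + slope * p) for p, q in zip(pre, post)])
        for pre, post in schools
    ]
    # Procedure B: fit on school means and take residuals of the means.
    pre_means = [mean(pre) for pre, _ in schools]
    post_means = [mean(post) for _, post in schools]
    slope_m, intercept_m = fit_line(pre_means, post_means)
    from_school_means = [
        q - (intercept_m + slope_m * p) for p, q in zip(pre_means, post_means)
    ]
    return individual_then_average, from_school_means
```

The two results generally differ because the regression line fitted to individuals need not match the line fitted to school means. For raw gain the two procedures coincide whenever the same students contribute both scores, since the mean of the differences equals the difference of the means.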

Insert Table 2 About Here 

These results seem quite consistent with the results of the earlier 
study of individual change estimates (Richards, 1974). Change is estimated 
most accurately by techniques that involve the difference between the
pretest and the posttest, and these techniques seem equally accurate
(i.e., raw gain is just as accurate as residual gain). For the most
part, there is little difference between change estimates based on 
individual students and change estimates based on school means. In a 
few cases estimates based on school means have a clear advantage and 
these estimates are also easier to compute, so subsequent analyses in 
this paper involve only estimates based on school means. 

The next analysis evaluated the accuracy of these change estimates 
as measures of school impact. Table 3 summarizes the correlations between
impact and various change estimates. For comparative purposes, this 
table also summarizes the correlations between impact and average true 
growth scores. 

Insert Table 3 About Here 

These results indicate that change estimates can be quite effective 
in rank ordering schools with respect to their impact even when students 
are assigned to schools nonrandomly. The simple gain scores again were
just as accurate as the residual gain scores and, as Cronbach and Furby
(1970) point out, posttest score measures impact adequately when students
are assigned to treatments randomly.

The results also indicate that it is important to measure change
over an appropriate interval. Adjusted posttest scores, simple gain
scores, and regressed gain scores all rank ordered schools accurately
when they involved change from initial status, but none of the measures
were particularly effective in rank ordering schools when they involved
growth in the later stages of the learning process. This
ineffectiveness reflected the true situation, because it is also characteristic
of the true growth scores. The ETS data resemble other longitudinal or
learning data in a number of respects (Richards, 1974), so these findings
about when to measure change should have considerable generalizability.

The final question examined in this study involves the relationships 
among these change measures and the school characteristics that cause 
variations in impact. Such results are more typical of what would be 
obtained in a "real" longitudinal study. Table 4 summarizes the relevant 
correlations between resources and change. The magnitudes of these 
correlations clearly follow the underlying relationship between resources 
and impact, but are somewhat lower. The smaller magnitude of these
correlations perhaps is partly the consequence of unreliability of the
change scores, but also appears to reflect the imperfect correspondence
between school impact and average true change. The results again indicate
that raw gain is about as accurate as any other change estimate,
reemphasize the importance of measuring change over an appropriate interval,
and suggest that the correlation between a school characteristic and
school impact must be reasonably substantial before any change score
will reveal the relationship.

Insert Table 4 About Here



Discussion 

Theoretical treatments of the issues considered in this paper have
emphasized the theoretical difficulties of using change scores in general
and of using simple gain scores in particular. The results of this study,
like those of the earlier study of individual change (Richards, 1974),
suggest that the practical importance of these theoretical difficulties
may have been exaggerated. It appears that change estimates over an
appropriate interval (e.g., the entire course of learning, not just the
later stages) do measure school impact with reasonable accuracy. The
correlations between change scores and other school characteristics
reflect with reasonable accuracy the relationships between the same
characteristics and school impact, but consequently will be large (or
"significant") only when the underlying relationship is fairly substantial.
These conclusions appear relatively unaffected by random vs. nonrandom
assignment of students (although this finding could change for more severe
nonrandomness), or by whether change measures involve individual scores
or school means.¹

Insensitivity to weak relationships almost certainly is character- 
istic not just of change scores, but of all statistical procedures that 
might be applied to these data, and simple gain scores appear to reflect 
the true situation about as accurately as any other estimate of change 
or impact. Simple gain scores also are easier to compute than most other 
estimates and probably are more meaningful to non-researchers. Therefore,
the results of this study suggest that it often may be quite appropriate
to compare educational programs on the basis of simple pretest-posttest
differences.

¹It should be emphasized that these conclusions apply to true longitudinal
designs and this study should not be used to justify such procedures as
measuring impact by educational attainment adjusted for a test of academic
potential administered at the same time.

The discrepancy between this study and earlier theoretical
treatments may perhaps best be resolved in terms of degree of concern about
"Type I" errors. That is, theoretical treatments usually seem to assume
that educational treatments do not differ on impact and emphasize the
possibility that use of change scores, particularly simple gain scores,
will lead to the false conclusion that they do differ. Certainly this
possibility cannot be ignored, especially when the students assigned to
various treatments differ considerably (Astin and Panos, 1971; Cronbach
and Furby, 1970), and certainly it is possible to propose hypothetical
situations where change scores could be misleading or confusing, especially
if one has a taste for paradoxes (Lord, 1967). This study, on the other
hand, assumed that schools do differ on impact and asked how accurately
change scores describe these differences. The answer to this question
appears much more favorable to change scores. Indeed, the results
suggest that when one uses change scores over an inappropriate interval
in a correlational study there may be a greater danger of the false 
conclusion that schools do not differ with respect to impact than of the 
false conclusion that schools do differ. 

Cronbach and Furby (1970) correctly point out that some of the 
questions to which change scores might be applied could be answered more 
directly with such techniques as partial correlation. The advantages of 
such techniques are that they are more direct than change scores, however, 
not that they are more accurate, nor that they require less statistical 
sophistication. The results of this study lend support to the investigator
who prefers to use change scores for reasons of convenience or ease of 
understanding. 

Finally, the results of this study again illustrate the usefulness 
of simulation techniques for investigations of longitudinal methodology. 
It would be impossible to investigate the questions considered in this 
study with "real" longitudinal data because the investigator would have
no way of knowing either the true individual growth scores or the true
school impact scores. At best one could compute the intercorrelations
among different estimates of change (Dyer et al., 1969). With simulated
data it was easy to compute the correlations between true scores and the
different estimated scores. It would also be easy to extend the simulation
procedures to the situation where considerable attrition of subjects occurs,
to the situation where one has only pseudo-longitudinal data (e.g., test
scores for Occasions 1 and 2 obtained from different groups of students
in the same school), or to different models for growth. Thus, simulation 
techniques offer considerable promise for refining our knowledge about 
when various procedures for analyzing longitudinal data are appropriate. 